12 research outputs found

    Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

    Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering analysis all assume a common diagonal covariance matrix across clusters, which, however, may not hold in practice. To analyze high-dimensional data, particularly those with relatively low sample sizes, this article introduces a novel approach that shrinks the variances together with the means, in a more general situation with cluster-specific (diagonal) covariance matrices. Furthermore, selection of grouped variables via inclusion or exclusion of a group of variables altogether is permitted by a specific form of penalty, which facilitates incorporating subject-matter knowledge, such as gene functions in clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method. Comment: Published at http://dx.doi.org/10.1214/08-EJS194 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Variable selection in penalized model-based clustering via regularization on grouped parameters

    Summary: Penalized model-based clustering has been proposed for high-dimensional data with small sample sizes, such as those arising from genomic studies; in particular, it can be used for variable selection. A new regularization scheme is proposed to group together multiple parameters of the same variable across clusters, which is shown both analytically and numerically to be more effective than the conventional L1 penalty for variable selection. In addition, we develop a strategy to combine this grouping scheme with grouping structured variables. Simulation studies and applications to microarray gene expression data for cancer subtype discovery demonstrate the advantage of the new proposal over several existing approaches.

    Functional group-based linkage analysis of gene expression trait loci-3

    Phosphoinositide-mediated signaling) and 5 (regulation of cyclin dependent protein kinase activity). Copyright information: Taken from "Functional group-based linkage analysis of gene expression trait loci", http://www.biomedcentral.com/1753-6561/1/S1/S117. BMC Proceedings 2007;1(Suppl 1):S117-S117. Published online 18 Dec 2007. PMCID: PMC2367612.

    Functional group-based linkage analysis of gene expression trait loci-0

    Ed signaling; 3) GTP biosynthesis; 4) purine nucleotide biosynthesis; 5) regulation of cyclin dependent protein kinase activity; 6) meiosis; 7) mRNA-nucleus export; 8) cholesterol metabolism; 9) biosynthesis; and 10) epidermis development.

    Functional group-based linkage analysis of gene expression trait loci-4

    Ed signaling; 3) GTP biosynthesis; 4) purine nucleotide biosynthesis; 5) regulation of cyclin dependent protein kinase activity; 6) meiosis; 7) mRNA-nucleus export; 8) cholesterol metabolism; 9) biosynthesis; and 10) epidermis development.

    Pairwise correlations of the ten functional groups with highest mean heritability


    Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data

    Motivation: Model-based clustering has been widely used, e.g. in microarray data analysis. Since for high-dimensional data variable selection is necessary, several penalized model-based clustering methods have been proposed to realize simultaneous variable selection and clustering. However, the existing methods all assume that the variables are independent with the use of diagonal covariance matrices.